Polystore Query Rewriting: The Challenges of Variety

Author

  • Yannis Papakonstantinou
Abstract

Numerous databases marketed as SQL-on-Hadoop, NewSQL [16] and NoSQL have emerged to catalyze Big Data applications. These databases generally support the 3Vs [7]: (i) Volume, the amount of data; (ii) Velocity, the speed of data in and out; and (iii) Variety, semi-structured and heterogeneous data. As a result of differing use cases and design considerations around the Variety requirement, these new databases have adopted semi-structured data models that differ from one another. Their query languages vary even more. Some variations are superficial syntactic differences. Some arise from the data model differences. Others are genuine differences in query capabilities. Yet another kind of variation involves subtly different semantics for seemingly similar query functionality; for example, equality may have subtle and unexpected meanings in the presence of missing attributes in NoSQL databases. Even within a single organization, it is common to find multiple databases that exhibit high variety, and applications often require integrated access to them. Writing optimized software that retrieves data from multiple such databases is difficult, given the different data models, the different query syntaxes and the (often subtly) different query semantics. This problem has been recognized for many decades in the database community. It is now accentuated, as a plethora of different and specialized databases find their place in the enterprise. For example, the problem arises whenever an enterprise adopts a fast and scalable NoSQL database to capture its users' activity on its web site (web log) and then builds applications that need integrated access both to the web log data stored in the NoSQL database and to data in its existing SQL databases.
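
The remark about equality and missing attributes can be made concrete with a small sketch. The Python below is not taken from the paper; it is a minimal illustration under stated assumptions. It compares a SQL-style three-valued equality, in which a comparison involving null is UNKNOWN and therefore selects nothing, with a document-store-style equality in which null also matches a missing attribute. The names `MISSING`, `sql_style_eq`, `doc_style_eq` and `select`, as well as the sample records, are hypothetical.

```python
# Hypothetical sketch: one equality predicate, two semantics.
MISSING = object()  # sentinel for an attribute that is absent from a record

records = [
    {"id": 1, "city": "Athens"},
    {"id": 2, "city": None},  # attribute present, value is null
    {"id": 3},                # attribute missing altogether
]


def sql_style_eq(left, right):
    """Three-valued equality: returns None (UNKNOWN) if either side is null or missing."""
    if left is None or left is MISSING or right is None or right is MISSING:
        return None
    return left == right


def doc_style_eq(left, right):
    """Assumed document-store semantics: a missing attribute is treated as null."""
    if left is MISSING:
        left = None
    if right is MISSING:
        right = None
    return left == right


def select(rows, attr, value, eq):
    """Return ids of rows whose predicate is exactly True; UNKNOWN is filtered out."""
    return [r["id"] for r in rows if eq(r.get(attr, MISSING), value) is True]


# The "same" query, city = null, under the two semantics.
sql_ids = select(records, "city", None, sql_style_eq)
doc_ids = select(records, "city", None, doc_style_eq)
print(sql_ids)  # no record qualifies under three-valued logic
print(doc_ids)  # records 2 and 3 qualify when null matches missing
```

A rewriter that translates one query into both dialects would have to account for this kind of gap, for instance by adding explicit null or missing checks on one side, so that both systems return the same answer.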


Similar resources

BigDAWG Polystore Release and Demonstration

The Intel Science and Technology Center for Big Data is developing a reference implementation of a Polystore database. The BigDAWG (Big Data Working Group) system supports “many sizes” of database engines, multiple programming languages and complex analytics for a variety of workloads. Our recent efforts include application of BigDAWG to an ocean metagenomics problem and containerization of Big...


Interoperability in Peer Data Management Systems

Interoperability plays an important role for a variety of applications. One of them is Peer Data Management Systems, where autonomous data sources (peers) interact with each other based on semantic mappings between their schemas. The building blocks that enable interoperability and thus the main challenges in such systems are mapping representation, query rewriting, and efficient query process...


Query Rewriting Under Ontology Evolution

One of the most prominent reasoning techniques for query answering is query rewriting. In recent years a wide variety of query rewriting systems has been proposed. All of them accept as input a CQ Q and a fixed ontology O and produce a rewriting for Q, O. However, in many real world applications ontologies are very often dynamic, that is, new axioms can be added or existing ones removed frequentl...


Light-weight Domain-based Form Assistant: Querying Databases on the Web

The Web has been rapidly “deepened” by myriad searchable databases online, where data are hidden behind query forms. Helping users query alternative “deep Web” sources in the same domain (e.g., Books, Airfares) is an important task with broad applications. As a core component of those applications, dynamic query translation (i.e., translating a user’s query across dynamically selected sources) ...


Benchmarking Ontology-Based Query Rewriting Systems

Query rewriting is a prominent reasoning technique in ontology-based data access applications. A wide variety of query rewriting algorithms have been proposed in recent years and implemented in highly optimised reasoning systems. Query rewriting systems are complex software programs; even if based on provably correct algorithms, sophisticated optimisations make the systems more complex and erro...




Publication date: 2016